arXiv:1509.01383v2 [physics.soc-ph] 6 Nov 2015 


Role of adjacency matrix degeneracy in maximnm-entropy-weighted network models 


O. Sagarra,^ C. J. Perez Vicente,^ and A. Di'az-Guilera^ 

^ Departament de Fisica Fonamental, Universitat de Barcelona, 08028 Barcelona, Spain 

Complex network null models based on entropy maximization are becoming a powerful tool to 
characterize and analyze data from real systems. However, it is not easy to extract good and 
unbiased information from these models: A proper understanding of the nature of the underlying 
events represented in them is crucial. In this paper we emphasize this fact stressing how an accurate 
counting of configurations compatible with given constraints is fundamental to build good null 
models for the case of networks with integer valued adjacency matrices constructed from aggregation 
of one or multiple layers. We show how different assumptions about the elements from which the 
networks are built give rise to distinctively different statistics, even when considering the same 
observables to match those of real data. We illustrate our hndings by applying the formalism to three 
datasets using an open-source software package accompanying the present work and demonstrate 
how such differences are clearly seen when measuring network observables. 


PACS numbers: 64.60.aq,89.75.Fb,89.75.Hc,89.65.-s 

I. INTRODUCTION 

Network science [1] is a prime example of the multiple 
uses that many tools and methodologies extracted from 
traditional physics can have when applied to a variety of 
transdisciplinar problems. 

The advent of the so-called Big Data era given by the 
explosion of ICT technologies is providing researchers 
with unprecedented large datasets in a myriad of differ¬ 
ent fields ranging from biology [2] to urban studies [3], 
including bibliometrics |3], chemical sciences or even 
history [B] to cite a few. The current challenge is to ex¬ 
tract knowledge and useful information from such enor¬ 
mous datasets. A standard approach is based on rep¬ 
resenting them as a graph. Network representation of 
data is specially useful due to its relative simplicity in 
terms of computational effort, visualization [7] and anal¬ 
ysis. However it presents some serious limitations which 
forces to look for innovative methodological tools. 

One way to go beyond simple network representation 
is to generate appropriate null models. They must be 
flexible and reliable enough to compare our original data 
to in the search of statistically relevant patterns. In 
general, this is not a simple task, data processing is 
tricky and subtle in many situations and it may lead 
to wrong conclusions based on a poor understanding of 
the problem under study. A clever strategy to find effi¬ 
cient null models consists on generating randomized in¬ 
stances of a given network while keeping some quanti¬ 
ties constant [3 [5]. This can be done by algorithmic 
randomization of graphs [5] but such a procedure can be 
costly in terms of computational time (specially for dense 
datasets) and programming difficulty. Most importantly, 
most ’’rewiring techniques” do not always generate an 
unbiased sampling of networks [TO] . 

A different approach to this problem has its roots in 
the analogy of networks with classical statistical mechan¬ 
ics systems [TTHT5] , though it was originally proposed by 
sociologists and also by urban planners [16] under the 
name of exponential random graphs m It is based on 


the idea of constructing an ensemble of networks with 
different probabilities of appearance, which on average 
fulfil the considered constraints. The advantage of these 
methods is that they consider the possibility of having 
fluctuations (as usually happens in real data) and their 
derivation is completely analytical. Furthermore, such 
methods provide an easy way of rapidly simulating (and 
averaging) network instances belonging to a given en¬ 
semble. So far in the literature successful development of 
this kind of methodology has been performed for different 
types of monolayered networks [Ml HUSO], directed [14] 
and bipartite structures ED, stochastic block models [12] 
and some multiplex weighted networks [2]. 

Recently, there is a growing interest to study more 
complex mathematical structures [231 El] to account for 
the inherent multi-layered character of some network sys¬ 
tems. This fact calls for the need of developing max¬ 
imum entropy ensembles with a multi-layered perspec¬ 
tive [4], which will help in the analysis of real world 
datasets. This is the main goal of this work. In this pa¬ 
per, we complement previous work on maximum entropy 
weighted networks by considering systems of aggregated 
multiplexes, where we have information about the layered 
structure of the system and the nature of their events, but 
-as usually happens for real data- we only have access to 
its accumulated structure (the sum of weights connecting 
two nodes in each layer for each pair of nodes). We show 
how the role of event and layer degeneracy induce impor¬ 
tant differences in the obtained statistics for each case, 
which recovers the mono-layered studied cases when the 
number of layers is set to unity. We further show that, 
despite the statistics being different, all the cases con¬ 
sidered are examples of maximum likelihood networks of 
the dual problem [25] but yield different expectations for 
network quantities, highlighting the need of choosing an 
appropriate null model for each case study based on the 
weighted character of the networks. 

In section [ll] we present the mathematical framework 
and calculations of degeneracy for maximum entropy net¬ 
works with arbitrary constraints. Section [TTI] extends the 
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calculation to obtain explicit statistics for a very general 
case of constraints for the different cases considered. Fi¬ 
nally, section [IV| presents an application of the model for 
the particular case of fixed strengths on the analysis of 
three real world datasets. Extended mathematical calcu¬ 
lations are provided in the Appendix and details on the 
used datasets, measured quantities and numerical meth¬ 
ods in the Supplementary Material (SM) accompanying 
this work [2S] , including also a package for public use to 
apply the proposed models m- 


II. MAXIMUM ENTROPY CONSTRAINED 
GRAND-CANONICAL NETWORK ENSEMBLES 
WITH INTEGER WEIGHTS 

We consider a representation of a network of N nodes, 
based on an adjacency matrix T composed by positive 
integer valued entries Uj € N which we call occupation 
numbers. Each of these entries accounts for the intensity 
of the interaction between any given pair of nodes i and 
j in the network, measured in terms of discrete events 
(which may be trips between locations in a mobility net¬ 
work or messages between users in social networks for in¬ 
stance) . We study the case of directed networks with self¬ 
loops, albeit the undirected case follows from the deriva¬ 
tion. Our objective is to fully determine the grand canon¬ 
ical ensemble of networks [niiiniisH] which fulfill on av¬ 
erage some Q-\-l given constraints {T = Uj, C,{T)}. 
The total number of events T = tij determines the 
sampling of the network, and is the minimal required 
constraint to consider any ensemble under the present 
framework |20j . In this paper, we examine the problem 
where constraints take the form of linear combinations of 
functions of the individual occupation numbers Uj, 

VgeQ. 

( 1 ) 

To completely determine an ensemble, it is not enough 
to specify the quantities we wish to fix (the constraints 
given by the original data), we must also define the sta¬ 
tistical nature of the events allocated to the occupation 
numbers tij. In other words, we have to count all the 
network instances which give rise to the same particular 
configuration of the adjacency matrix T. This degen¬ 
eracy term D{T) depends solely on the specifics of the 
system one represents, and counts the number of equiv¬ 
alent (micro) states that a particular unique realization 
of the adjacency matrix T can describe. 

Once a grand canonical ensemble is fully constructed, 
the probability to obtain a particular configuration of 
occupation numbers T reads, 

P({6IJ,T) = 

H{{e,},T) = 9TT{T)+Y,0<iC,{T). ( 2 ) 
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The so-called Grand-Canonical partition function Z = 
^{T} must be summed considering all 

the possible configurations of the adjacency matrix T 
one can consider, regardless of whether the proposed con¬ 
straints are met. Such a probability with an exponential 
form m is obtained by maximizing the Shannon en¬ 
tropy S = X]{T} P(T)lnP(T) associated to the ensem¬ 
ble while preserving the Q -I- 1 constraints on average. 

Using Equation Q we reach, 

P{T)=Z-^DiT)Yl4^z.,{Uj) 

. ( 3 ) 

Zy(tij) = ZT = e^. 
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Let us notice that if the degeneracy term factorizes, i.e., 
D{T) oc Ylij Dij{tij) the partition function can be re¬ 
expressed as 

OO 

Z = Zy = Dij{tij)Zrp^ Zijitij) 

ij ij tij=0 

P(T)=nPb(ib)=n , 

( 4 ) 

where the statistics of T are formed by a set of inde¬ 
pendent random variables corresponding to the occupa¬ 
tion numbers {tij}. Whenever one defines the degen¬ 
eracy term and is able to sum the individual partition 
functions then one gets the explicit statistics of the 
occupation numbers. The values of the Lagrange mul¬ 
tipliers {9t, {dq}) associated to the Q -\- 1 constraints 
(which can also be understood as a posterior hidden vari¬ 
ables [21113^) are obtained by solving the so called saddle 
point equations, 

OO 

C, = (C'(T)) = ^(/*^(t.,)) = E E 

ij ij t'ij=0 

t=J2 ■ 

'ij 

( 5 ) 

where {Cq} are the values of the quantities one wants to 
keep fixed (on average) in the ensemble. 

The degeneracy terms are in general subtle to compute 
and to the best of our knowledge, are seldom considered 
in the literature. In order to calculate them, however, 
we need to make considerations about the type of net¬ 
works at study and their elements. In this work, we 
consider systems composed by events that are either dis¬ 
tinguishable or indistinguishable. Additionally, we study 
the general representation of an overlay multiplex net¬ 
work, which is obtained by aggregating several layers of 
a system into a single (integer weighted) adjacency ma¬ 
trix [23l |24l ■ Examples of such networks range from ag¬ 
gregation of transportation layers |31j . networks gener¬ 
ated by accumulation of information over a certain time 
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span such as Origin-Destination matrices [3] , email com¬ 
munications |32| or human contacts [33] and even an ag¬ 
gregation of trading activities in different sectors such as 
the World Trade Network |53] . 

Thus the system under consideration is an aggrega¬ 
tion of M network layers containing the same type of 
events: They can be either a group of layers composed by 
distinguishable (Multi-Edge - ME) or indistinguishable 
(Weighted - W) events or even an aggregation of Binary 
(B) networks. The occupation numbers corresponding to 
layer m are noted but we only have access to informa¬ 
tion about their accumulated value through all the layers, 
i.e. the aggregated occupation numbers 

Finally, the degeneracy term is the product of the 
multiplicity induced by the nature of the events times 
the nature of the layers (which in the only real pos¬ 
sible scenario are always distinguishable) D(T) = 
D{T)EventsD{T)Layers- This last term is computed (for 
each pair of nodes or state ij) by counting the num¬ 
ber of different groupings one can construct by splitting 
tij — (distinguishable or indistinguishable) aggre¬ 

gated events into M different layers respecting the occu¬ 
pation limitation of the considered events: Either only 
one event per layer (Binary network) or an unrestricted 
number (Weighted and Multi-Edge networks). 


Network Type 

D(T)Events 

D(JT'j Layers 

Multi-Edge (ME) 

Tl 

y-f (Em *15)! y-f A/rEm*i5 



Weighted (W) 

1 


Binary Dist. (BD) 

T! 


Binary Indist. (BI) 

1 

n.(Em"i5) 


TABLE I. degeneracy terms corresponding to the elements of 
the system and their layers for each case. 


The resulting degeneracy terms are shown in table |T] 
(see details in Appendix [d|, from which one can see that 
in principle the event degeneracy term does not factor¬ 
ize for the distinguishable cases due to the presence of 
the variable T! = One can nevertheless ob¬ 

tain an effective degeneracy term by substituting it by 
T! (a constant) -as shown in Appendix [pj where a com¬ 
plete discussion of the implications of this substitution 
for the different cases is provided- which leads to results 
fully equivalent to those obtained by performing the ex¬ 
act calculation for the Multi-Edge case with constraints 
of the form ([^. In doing so, two preliminary conclusions 
can be drawn. Firstly, both the distinguishable and indis¬ 
tinguishable binary cases will lead to the same statistics 
since their degeneracy term on events will be constant 
(hence on the remainder of the paper we will omit the 
case BD). Secondly, in all the cases the complete degen¬ 
eracy terms will factorize into state ij independent terms, 
which means that the statistics of the aggregated occu¬ 
pation numbers will be state independent (equation 0). 


III. LINEAR CONSTRAINTS ON 
AGGREGATED OCCUPATION NUMBERS 


To go further in our derivation, we now consider the 
case where the constraints are linear functions of the ag¬ 
gregated occupation numbers, 

= (6) 

m 

Such a case is very generic and includes networks with lo¬ 
cal constraints on nodes [35] , community constraints [18] 
and generalized cost constraints such as distances [36] . 
The case where the constraints depend on both the bi¬ 
nary projection of the occupation numbers and their val¬ 
ues fij{tij) = CgHij + Cg^Q{tij) can be derived from the 
methodology developed here and is analyzed in Appendix 

El 

The individual partition functions can be summed 
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In this case, we have redefined Zij = to ease 

notation. This leads to. 
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z,.7(l-z.,)" (8) 


And we recover well known probability distributions: 
Poisson distribution for the Multi-Edge case |5D] (in¬ 
dependent of the number of layers M\ Negative Bino¬ 
mial for the Weighted case (being the geometric distri¬ 
bution m a special case when M = 1) and Binomial 
distribution for the aggregated Binary case (being the 
Bernoulli distribution m a special case for M = 1). 

The resulting statistics show some important features: 
On the one hand, one sees that albeit the degeneracy 
term changes for Multi-Edge networks for either case of 
a monolayer or a multilayer, the form of the obtained 
statistics does not. This means that it is not possible 
to distinguish a Multi-Edge monolayered network from 
an aggregation of multiple Multi-Edge layers belonging to 
an ensemble with the same constraints. On the other 
hand, the situation for the other cases changes: For mul¬ 
tiplexes the resulting occupation numbers will have dif¬ 
ferent statistics from the monoplex case. This has the 
implication than one could in principle discern the aggre¬ 
gated nature of a network by inspection of their accumu¬ 
lated edge statistics {tij}, provided that one has access 
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to enough realizations of a system and that it belongs to 
the same ensemble (i.e. the system evolves according to 
some given, even if unknown, linear constraints [3] of the 
form in equation ([^). 


Network Type 

(tij) CFi^. 

Domain Zij 

ME 

M Zij M Zij {Mzij)~^ 

[0, oo) 

W 


[0,1) 

B 


[0,oo) 


TABLE II. First and second moment of the considered distri¬ 
butions, together with the relative fluctuations. 


Another important implication of the obtained statis¬ 
tics is the very different interpretations encoded in the 
values Zij. This collection of values is related to the 
constraints originally imposed to the network ensemble 
through the set of Lagrange multipliers {9t, {9q}) (equa¬ 
tions ([^ and ([^) and can be understood as a posteriori 
measures related to the intensity of each node-pair ij. 
These measures encode the correlations between nodes 
imposed by the constrained topology (note that for local 
constraints only at the level of nodes we obtain a fac¬ 
torization (ty) = MziUj). Table [TT| reports the two first 
central moments of each distribution. For the Multi-Edge 
case Zij is both directly mapped to the average occupa¬ 
tion of the considered link ij, (tij) and to its (relative) 
importance in the network [20j. In all the other cases, 
however, Zij relates to a probability of a set of events 
emerging from a given node, to be allocated to a link ij. 
Obviously, as (Uj) grows, Zij grows in all cases, but not 
in the same linear way (in the W case, for instance, Zij 
is bounded to a maximum value of 1). This means that 
while in all cases Zij is related to the importance of a 
given link with respect to the others, the dependency in 
all non ME cases is highly non-linear. Finally, we can 
see that for a large number of layers M ^ T /N"^, the en¬ 
sembles become equivalent to the ME, as the degeneracy 
term on (distinguishable) layers dominates the configu¬ 
ration space of the ensembles. 

The different obtained statistics are highly relevant as 
their marked differences point out at a (regularly over¬ 
looked) problem: Different maximum entropy ensembles 
yield very different statistics for the same considered con¬ 
straints, and hence each dataset needs to be analyzed care¬ 
fully, since the process behind the formation of each net¬ 
work dictates their degeneracy term. Furthermore, all 
the obtained statistics are derived from a maximum en¬ 
tropy methodology, and hence the values Zij obtained 
from ([^ are in all cases maximum likelihood estimates 
for the probability of T to belong to the set of models 
described by equation ^ (see Appendix]^. Thus, any 
of the presented models will be a correct ensemble in a 
maximum likelihood sense [25] for some given constraints, 
and the appropriate choice for each network representa¬ 
tion depends on the system under study, in contrast to 


the interpretation given by |35) . 

This means that if one wants to assess the effects a 
given constraint has on a network constructed from real 
data, one needs to very carefully choose the appropriate 
null model to compare the data to. It is also worth point¬ 
ing out that most of these ensembles are not equivalent to 
a manual rewiring of the networ k 137] (albeit one expects 
small differences, see Appendix O. However, maximum 
entropy models allow for an analytical treatment of the 
problem, and simplify the generation of network samples 
when the considered constraints are increasingly compli¬ 
cated (both at the coding level and at the computational 
one). This has many implications, including the possibil¬ 
ity of computing p values, information theoretic related 
quantities such as ensemble entropies [nHni [n na or 
model likelihoods as well as efficient weighted network 
pruning algorithms [38l [39] • Moreover, this procedure 
helps in the fast and simple generation of samples of net¬ 
works with prescribed constraints. 

The main difficulty of the soft-constrained maximum 
entropy framework hereby presented for null model gen¬ 
eration is the problem of solving the saddle point equa¬ 
tions ([^ associated to each ensemble. With the exception 
of some particular cases M: these equations do not have 
an analytical solution and must be obtained numerically. 
In such a case, the best approach is to maximize the 
associated loglikelihood of each model to a set of obser¬ 
vations (constraints), yet the difficulty of each problem 
increases with the number of constraints since each fixed 
quantity has an associated variable to be solved. Consid¬ 
ering the different statistics obtained in this paper, the 
most difficult case by far is the Weighted one (W), since 
the condition that 0 < zij < 1 imposes a non-convex 
condition in the domain of the loglikelihood function to 
maximize, while the others are in general easily solved 
using iterative balancing algorithms (see SM for an ex¬ 
tended discussion and details on the numerical methods 
used). 


IV. APPLICATION TO REAL DATA: THE 
CASE OF FIXED STRENGTHS 

To highlight the importance of considering an appro¬ 
priate null model for the assessment of real data features, 
in this final section we consider the case of networks with 
a fixed strength sequence. Real networks usually display 
highly skewed node strength distributions, which have 
important effects in their observables. Hence, to cor¬ 
rectly assess whether some observed feature in a dataset 
can be solely explained by the strength distribution, it 
is crucial to choose an appropriate null model to com¬ 
pare the data to. This situation is especially important 
for instance with regard to community analysis through 
modularity maximization for weighted networks, because 
the modularity function to be optimized [TS| needs as in¬ 
put a prediction from a null model with fixed strengths 
{Q oc j i^ij ~ ^ci,cj where {c} are the commu- 
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nity node labels associated to the optimal network par¬ 
tition). For a directed model with fixed strengths, the 
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So the resulting saddle point equations (HI are 

gout ^ ^ ^ 

where s denotes the numerical value found in the data 
for the random variable s, which particularized to each 
case reads, 


ME: 


W: 

B: 


sT^ = Mx,Y.jy, 

sf = Myj Xi 

5“ = My, E. 
s°“‘ = Mx, - 


Vi 


■iVi 
Vo 

1 i+XiVj 


= My, E 


i l+XiUj 


( 11 ) 


The ME case has an analytical solution m while the 
others must be solved computationally. Supplementary 
Material SM provides extended details about the network 
quantities computed, simulations, averaging and compu¬ 
tational methods and algorithms used in this section, 
which are available in the freely provided, open source 
ODME package m)- 

As real world datasets we use a snapshot of the World 
Trade Network (WTN), the OD matrix generated by 
Taxi trips in Manhattan for the year 2011 USD] and the 
multiplex European Airline Transportation network m- 
WTN has been vastly studied in the literature and re¬ 
cently has been represented as an aggregated system of 
M ~ 100 layers [H] representing different types of com¬ 
modities being traded. In this network, nodes represent 
countries and weights represent the amount of trade be¬ 
tween them, measured in millions of US dollars. In the 
OD Taxi dataset, which we construct as the aggrega¬ 
tion of M = 365 daily layer snapshots, each node repre¬ 
sents an intersection and each weight the number of trips 
recorded between them [ID]. Finally, in the airline net¬ 
work each node is an airport and weights correspond to 
the number of airlines providing direct connections be¬ 
tween them, so the network is an aggregation of M = 37 
binary layers (one for each airline). 

In all cases we will consider directed networks, and 
throughout this paper we will only show results in the 


outgoing direction, as the results in the incoming direc¬ 
tion are qualitatively equal. Note that the binary aggre¬ 
gated case cannot be always applied since the maximum 
number of events allocated per node pair cannot exceed 
the number of layers, and for the WTN dataset this con¬ 
dition (max({si}) < = NM) is violated for some 

nodes. 

To analyze the difference between models, we compute 
ensemble expectations for different edge and node related 
properties suitably rescaled (fixing the original strength 
distribution of each dataset) and then compare the ob¬ 
tained results with the real observed data features. 

The airline dataset is very sparse and differences be¬ 
tween models are not wide, which points out the need of 
adequate sampling for a successful analysis on weighted 
networks. Anyhow, since it is by construction an ag¬ 
gregation of binary layers, the B case displays the most 
resemblance to the data, both qualitatively and quanti¬ 
tatively (see SM). 

The WTN and Taxi datasets, in contrast, contain 
enough sampling for the wide differences between mod¬ 
els to emerge. All cases have the same number of events 
T on average, but they are not distributed among con¬ 
nections between nodes in the same way for the different 
models. Being zero the most probable value for the geo¬ 
metric distribution, for the W case with a single layer 
the connection probability initially grows distinctively 
faster than in all the other cases leading to larger number 
of binary connections between low strength nodes (Fig¬ 
ure §A ,B). Yet the higher relative fluctuations of the ge¬ 
ometric statistics also generate extremely large maximum 
weights in the tail of the existing occupation number dis¬ 
tribution (Figure]^, which are concentrated in connec¬ 
tions between high strength nodes (Figures[^C,D). Since 
the total number of events incoming and outgoing each 
node is fixed, this means that the W case has compar¬ 
atively the lowest degrees for the most weighted nodes 
despite counting the larger number of binary connections 
^ = Ynj ) = Yoi kr* = Yoj kr as can be seen in 
Figure [^A,B. 

These anomalies for low and high strength nodes re¬ 
spectively for the W case produce wild asymmetries in 
the allocation of weights per node, which can be studied 

measuring their disparity Y 2 = {j2, (Figure 

|^C,D), which quantifies how homogeneously distributed 
are the weights emerging from each node: It displays a 
U shaped form with both low and high strength nodes 
tending to very strongly concentrate their weights on few 
connections. This non-monotonic behaviour is in strong 
contrast with the one observed for the real data and 
usually in other datasets [Ij. Concerning second order 
node correlations, the outgoing weighted average neigh¬ 
bor strength (Figure again dis¬ 

plays a large range of variation for the W case (with either 
one or more than one layer) in contrast with the slight 
assortative profile of the real data, the uncorrelated pro¬ 
file of the ME case and the slight dissassortative trend of 
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FIG. 1. (Color online) Node pair statistics: Binary connection probability (top) and rescaled average edge weight (bottom) 
as function of product of origin and destination node strength. Results averaged over r = 5 • 10^ and r = 10^ realizations for 
the different models respectively with applied log-binning. The sudden increase for the binary pair-node connection probability 
can be clearly seen for the W case. 



FIG. 2. (Color online) Weight statistics: Existing node pair weight complementary cumulative distribution for the Taxi 
(left) and WTN (right) datasets. Same conditions as Figure apply. The presence of extremely large weights can be seen in 
the tails of the distributions for both the W monolayer and multilayer case. 
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the B case. This last case is caused by the combination 
of two factors: The limitation on the maximum weight 
of the edges cannot compensate (with large weights con¬ 
necting the nodes with the larger strength) the tendency 
of large nodes to be connected to a macroscopic frac¬ 
tion of the network, which is dominated by low strength 
nodes. 

Obviously none of the null models used reproduce the 
real data, however, the goal in model construction is 
rather to assess the structural impact that a given con¬ 
straint (in this case a strength distribution) has on the 
network observables. In this sense, we show that dif¬ 
ferent models provide very different insights about such 
impacts. In particular, since the Airline dataset is by 
construction an aggregated binary network and the Taxi 
dataset a Multi-Edge one (people riding Taxis are clearly 
distinguishable), the fact that the B and ME cases re¬ 
spectively lie closer to the real data comes at no surprise. 
The WTN case, however, is unclear: The modelling of 
trade transactions has not a clearly defined nature, but 
if one assumes the WTN to be a multilayered network, 
its aggregated analysis should be performed using either 
the W (with M > 1 layers) or ME case, which again are 
closer both in functional form and qualitative values to 
the real case (in contrast with mi where the W model 
with a single layer is used). 


V. CONCLUSIONS 

In this work we have showed the importance of consid¬ 
ering the nature of the events one wishes to model when 
using an integer valued network representation of a sys¬ 
tem. We have developed and solved a maximum entropy 
framework for model generation applied to networks gen¬ 
erated by aggregation [21] of multiplexes. We have 
shown how different considerations about the nature of 
the events generating the elements of the multiplex give 
rise to distinctively different node pair statistics. For the 
case where one wants to fix properties expressed as linear 
functions of the individual weights (and optionally their 
binary projection) in the network, we elegantly recovered 
well know statistics such as Poisson, Binomial, Negative 
Binomial, Geometric and Bernouilli for each case. 

We have further provided a practical example by focus¬ 
ing on the case of fixed strengths and applying the models 
to assess relevant features on three different real-world 
datasets containing different types of weights, showing 
how the role of adjacency matrix degeneracy plays a cru¬ 
cial role in model construction. To this end, we have 
made considerations about the statistical nature of the 
obtained models as well as the weaknesses and strengths 
derived from their practical applications in real cases. 
Finally, we provide the open source software package 
ODME |2I| for practitioners to apply the proposed mod¬ 
els to other datasets. 

The insights derived from this paper can open the door 
to the objective identification of truly multiplex struc¬ 


tures (except one case where it has been shown to be 
impossible) by inspection of the statistics of their edges, 
provided that several instances of a network belonging to 
the same ensemble are available. 

The take home message of this work is that in order 
to perform a meaningful analysis on a given network, a 
practitioner needs to be able to select an appropriate null 
model, which not only depends on the endogenous con¬ 
straints one considers but also on the very nature of the 
process one is modelling. Our work provides researchers 
with a range of maximum entropy (and maximum likeli¬ 
hood) models to choose from, covering a wide spectra of 
possibilities for the case of weighted networks. Each of 
this models is not wrong or even right in a general case 
despite yielding very different predictions for the same 
sets of constraints, but just more or less appropriate de¬ 
pending on the problem at hand one wants to study. 
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Appendix A: Binary constraints 


We develop hereby the general case where the con¬ 
straints have the general form fg^Uj) = Cg^Q{tij) + CgHij. 
The general derivation remains essentially unchanged 
from the linear case (main text) with a slight modifica¬ 
tion in the calculation of the explicit partition function. 


2ij — Dijitij) 


tij _ 


tii=0 


% E - A,(0) +A,(0) (Al) 


L tig—0 


Zij — 6 


n 


e 9 




Where {zij} have been redefined and the constraint on 
the total number of events T = t,, is introduced in 

their redefinition. This yields, 








% (E“=oA,(t',)4’ - A,(0)) + A,(0) 

(A2) 
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FIG. 3. (Color online) First order node statistics: Rescaled degree (top) and disparity (bottom) for the Taxi (left) and 
WTN (right) datasets. Same conditions as Figureapply. Dashed lines represent log-binned standard deviation ranges for the 
real data. The U-shaped disparity proHle is clearly seen for the W cases in sharp contrast with the monotonous behaviour of 
both the real data and the ME model. 
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FIG. 4. (Color online) Second order node statistics: Rescaled weighted average strength (bottom) for the Taxi (left) and 
WTN (right) datasets. Same conditions as Figureapply. Dashed lines represent log-binned standard deviation ranges for the 
real data. A sharp increase is clearly seen for high strength nodes in the W cases. 
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which corresponds to the zero-inflated versions (ZI) of 
the previous statistics recovered in the case fij = 
that is, asymmetric statistics where the probability of the 
first occurrence is different from the rest. Note that in 
this very general case, one can always set {c^ = OVfj} to 
include the case where only binary constraints are con¬ 
sidered). Explicitly, for the different statistics, we have. 


ME (ZIP - Zero Inflated Poisson): 


Pij{tij) — 






f • - I 


ij (f 


.M Zi- 


-i) + i 


W (ZINB - Zero Inflated Negative Binomial): 

;0(to) 


Pij{tij) — 


M - 


I" tij — 1 


% ((1 - - 1 ) + 1 


B (ZIB - Zero Inflated Binomial): 


Pij{tij) — 


((l + z,,)^-l) + l- 


(A3) 


We can then compute the binary connection statistics. 


= l-PijiO) 


ME 

W 

B 


5o((l-tZii)“-l) 
5o((l+^o)“-l) + l’ 


(A4) 


Note how the binary projection in all cases corresponds 
to Bernouilli statistics. Regarding the occupation num¬ 


ber statistics, one has explicitly, 

i^tj) — > 0) = 

' ME M - 


1-e" 


= I W M 


l-Zij (1-(1-Zij)“) 


B 


1 + Zij (1 —(l + Zij) 

{UY = {Q{u,)) (ttj) = 




^^31 

Mz,; ^ 

= W 1 


ag +ag 

R mA?_1_ 

l+Zij 


(A5) 


And we observe a clear non-trivial relation between 
binary statistics and weights, which leads to important 
correlations between degrees and strengths in networks 
belonging to these ensembles [20] which are also present 
in real data [i5] . 


Appendix B: Maximum likely hood distributions 

The probability distributions derived in this paper for 
networks belonging to the different described ensembles 
fulfil the maximum likelihood principle for networks |25j . 
Indeed, the constraint point equations in (i can be un¬ 
derstood as the equations resulting from maximizing the 
likelihood of the inverse problem of finding the set of val¬ 
ues for the Lagrange multipliers {Ot, {(^q\) that maximize 
the likelihood of the observed adjacency matrix T to be¬ 
long to each of the described models. In other words, 
defining the loglikelihood function of a network by 


/:(0T,{0J|t)=ln 


X\p^](fiT,{0q}\U 




and maximizing this expression with respect to {Ot, {Oq}) 
one is lead to the equation ([^. Explicitly, 


de^C = ^ (^b) - ^ (0(tb)» =Cq- {Cq) ({0J) = AC, 

ij 


(Bl) 


which at the critical points lead to AC, = 0 Vg and the 
condition of maximum with respect to all the variables 
is fulfilled (we also note that the problem in this form is 
concave). We thus see that the initial statement of the 


paper is confirmed: It is not enough to specify the con¬ 
straints to fully define a maximum entropy ensemble, but 
one needs also to state the nature of its elements, since 
any maximum entropy ensemble will lead to a maximum 
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likelihood description of a dataset. There is not a ’’cor¬ 
rect” ensemble to hx a given constraint, but the one that 
better describes the system that is being represented. 

Appendix C: Ensemble fluctuations 

If the maximum entropy description provided here is to 
be successful, then the fluctuations of the obtained net¬ 
works need to be bounded and have well defined statis¬ 
tics. In particular, we require that the associated entropy 
of the distribution and the statistics of the constraints 
possess finite first and second moments, and that their 
relative fluctuations around the average values need to 
be small in the limit of large sampling T. Explicitly, we 
have. 


where a = 0 for ME case, a = 1 for W case and a = — I 
for B case. We thus see that the fluctuations only disap¬ 
pear for large sampling in the linear case given by equa¬ 
tion § for the ME description. By construction, the 
constraints are extensive in the occupation numbers tij, 
thus when the number of events T grows their average 
value (Cq) must also grow [5D], yet only for the ME case 
we have oc T and thus only in this case relative fluc¬ 
tuations decay as T~^. Otherwise, the maximally ran¬ 
dom allocation of events will be made as homogeneous 
as possible among the states while preserving the con¬ 
straints, hence will in general be a large number (the 
denominator in the sum has L terms while the numera¬ 
tor has L{L — I), being L the number of available node 
pairs for the allocation) and relative fluctuations will be 
bounded and 0{M~^). Eor similar reasons, one expects 
the first term to vanish for large sampling. Eor very large 
number of layers, then the ensembles become equivalent 
to the ME case, and fluctuations vanish in the large sam¬ 
pling limit m- 

Eor the case where any binary constraint is addition¬ 
ally imposed (Appendix , the relative fluctuations of 
the binary structure dominate the statistics in the large 
sampling limit, and despite being bounded, these never 
vanish m- 

Concerning the associated Gibbs-Shannon entropy of 
the ensembles, since the occupation number statistics are 
independent, we have the random variable InP(T) = 

(associated to a given net¬ 
work instance) is a sum of independent contributions 
which in all cases studied (Poisson, Negative Binomial 


and Binomial) have well dehned first and second mo¬ 
ments when averaged over the ensemble. Hence, the 
statistics of InP(T) will be Gaussian, and no extreme 
outliers are expected. This indicates that the total av¬ 
erage number of possible network instances compatible 
with a given set of constraints is a well defined quanti¬ 
ties, and one can define a typical network structure rep¬ 
resenting the ensemble (unlike other studied cases in the 
literature [S]). 

Appendix D: Calculation of degeneracy terms 

1. Layer degeneracy: 

For each state ij out of the possible L{N) = node¬ 
pairs {N{N — I) if not accepting self-loops) one needs to 
consider the process of allocating ty events in M possible 
distinguishable levels. For the W case this corresponds 
to the um problem of placing identical balls in M dis¬ 
tinguishable urns. For the B case one faces the problem 
of selecting groups of ty < M urns out of a set of M 
urns and Anally for the ME case one must count how to 
place tij distinguishable balls in M distinguishable urns. 
These problems are well known and their solution leads 
to the second column in|lj with the product over ij rep¬ 
resenting the fact that the allocation among the layers 
for each node-pair is independent. 

2. Event degeneracy: 

For this calculation one only needs to take into ac¬ 
count the distinguishable case (otherwise there is no de¬ 
generacy). Such a case, however is controversial to ana¬ 
lyze. The correct counting of conflgurations in a Grand- 
Canonical ensemble is an issue spanning more than a 
century (see [l^ and references therein for details and ex¬ 
tended discussion), ever since Gibbs used it to establish 
the relation in classical statistical mechanics between the 
Canonical and Grand-Canonical Ensembles of an ideal 
gas. 

Grand Canonical ensembles of networks can be faced 
in many ways. The usual view is to imagine a collection 
of Af copies of a system in where to distribute F events 
in such a way that there are T = F/Af events on average 
in each copy |46] . In this framework, the probability to 
obtain a particular copy with T = tij events and a 
set of constraints {C'q(T)} reads, 

P(T) oc ^ 

The prior expression is however related to the probability 
to obtain a given configuration of T, regardless whether 
it is unique or not (several conflgurations can give rise to 
the same T). For the case of distinguishable events, there 
are (^j/j different ways of obtaining the same number of 
events T among the set of copies and T \/ tij ! ways of 




tij) 


M 1 


(Cl) 
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distributing them to obtain a given adjacency matrix T, 
hence one must consider an additional term in expression 


(Dll, 


V{T) 


Events 


F\ T\ 

t ) 


(D2) 


For the case with linear constraints of the form in equa¬ 
tion ([^, the system partition function Z reads = 


Z = J2V(T)11^ =T. 

{T} tj 


T=0 


E 


T! 


Yiij 






T=0 


= E (T) (^E = (1+^E 


T 




(D3) 


If we add the strong condition that the ensemble average number of events has to be equal to f, a scaling of M Eij 
on the total number of events F distributed among the copies M is made apparent, 


(T) =de^\nZ = Y, iUj) = F 


ME 


ij 


1 + M Eij Zij 


= T 


M = 


TjF 

1-fjF' 


(D4) 


Wrapping together the previous expressions and considering that the number of copies is arbitrary, one can imagine 
the limit where it goes to infinity keeping T constant. This amounts to consider F infinitely large too. 


Z = lim 

F—^OC 



1-T/F J 


ij ij 


(D5) 


which leads to a factorizable partition function of the form in equation 0 which does not depend on the number of 
copies of the system, as neither does its associated probability. 




^-T 


F—¥00 


F—¥(X> 


(l-l) 


n 




-e-^e-^+^ 




• I 


F—>oo \ F — T 


F-T 


(D6) 


We have thus reached the same independent Poisson 
probabilities as the ones obtained by taking an effective 
factorizable degeneracy term T! Y\ij{M*'^/tij') (see equa¬ 
tion 0). 

These results are in accordance with previous 
works |2()j where the complete equivalence between 
Canonical (ensembles with soft linear constraints as in 
equation but with T = T fixed for every network in 
the ensemble) and Micro-canonical ensembles (ensembles 
where all constraints are exactly fulfilled) of Multi-Edge 
networks was proven. The equivalence between Micro- 
canonical ensembles and Grand-canonical ensembles in 


Poisson form has also been validated by simulations for 
the case where strengths are fixed m- 

For non linear cases (such as the case with binary con¬ 
straints or the B case with distinguishable events), the 
effective degeneracy term is an approximation, since the 
complete calculation using partial sums where T is ex¬ 
actly fixed cannot be performed. Approximating T! by T\ 
amounts to consider that the possible fluctuations of the 
macroscopic variable T are caused by the state indepen¬ 
dent fluctuations of the microscopic structure given by 
{tij}. This however leads to the same statistics emerging 
from the leading order terms in T of the system parti¬ 
tion function computed using a Micro-canonical formal¬ 
ism (see pn]l. 
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Role of adjacency matrix degeneracy in maximum-entropy-weighted network models: 

Supplementary Material 


Appendix E: Datasets 

1. Taxi data 

The taxi data are the same as those used in other studies El SO], which represents the aggregated number of taxi 
trips between road intersections performed within the borough of Manhattan for the year 2012. Each node is an 
intersection {N = 4090) and T = tij — 150 • 10® trips are generated in this interval. The network contains 
a negligible fraction of self-loops and is directed. For the aggregation of multiplexes, we have considered that the 
network contains M = 365 different daily temporal snapshots. 


2. WTN 

The data for the World Trade Network have been obtained from m- We consider the aggregated snapshot for 
the trade (counted in millions of dollars) for the year 2002 between N = 184 countries appearing in the dataset, 
corresponding to M = 100 different trade commodities considered as layers. Since the dataset is incomplete, we have 
proceeded as follows for each pair of countries: If a flow is missing in both directions, we ignore the edge tij = tji = 0, 
furthermore, if only one direction is missing, the non-missing value is copied for both directions. Finally, a threshold 
of tmin = 1 is applied and all values are truncated to the nearest integer to enforce the condition tij € NVi,j. The 
network is directed and does not contain self loops nor nodes for which the incoming and outgoing degrees are zero. 


3. Multiplex European Airlines 

The data for the aggregated multiplex of flight connections between European cities have been obtained from (sa¬ 
lt has iV = 417 nodes which represent airports and weights represent the existence of a connection by a given airline 
between two airports. The layers are thus the M = 37 different airlines present in the dataset, and the adjacency 
matrix has been obtained by aggregating all the binary layers, ty = network is undirected but is 

represented as directed {tij = tji\/i,j) and does not contain self loops nor nodes for which the degrees are zero. 

The strength distribution for all the datasets is shown in figure and network quantities are summarized in table 

imi 


Dataset 

N 

s E/L = E, 

T/L 

t+ 

Sout 

Self-loops 

Taxis 

4090 

35569.357 0.433 

8.697 

1.0 

1.5 

0.7 Yes 

WTN 

184 

34350.603 0.323 

187.708 

10.0 

2.8 

0.8 No 

Airlines 

417 

17.206 0.034 

0.041 

1.3 

1.6 

1.5 No 


TABLE III. Network details on the used datasets. The symbol x = N ^ Xi indicates the graph-average of quantity x, 
L = N{N — 1) for the self-loops allowed case and L — N’^ otherwise, see section]^ for details on each quantity. 


Appendix F: Measured quantities 

In the paper, we compute the differences between models for different first and second order rescaled network 
metrics, detailed in the following (we only present the outgoing version of each quantity): 

• Scaled binary degrees: Represents the percentage of the network connected by at least a single event to a given 
node. 


ki 

N 


3 
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S 



FIG. 5. (color online) Outgoing strength and existing weight distribution for the considered datasets smoothed with log binning. 
All distributions are fat-tailed except the existing weight for the Airline dataset, which is bounded by the value M = 37 by 
construction. 


• Non-zero weight distribution: Distribution of occupation numbers per existing links. 


m = 




• Scaled graph-average weight and graph-average binary connection probability of edges as a function of product of 
incoming and outgoing strengths: We use this metric to analyze the correlations between the binary connection 
probability and the average weight of links, as a function of the ’’importance” of each link as measured by the 
product of the nodes’ strengths. 


T TNsN,, 


e(t)(S°^‘s' ) = 


1 




b|s““*=s.s)"=s' 


ij\s?“*=s,e‘"=s 


• Disparity: This bounded quantity 12,* G 1] indicates the homogeneity in the distribution of events 

among the edges emerging from a given node. The higher this value, the more concentrated the events are on 
a predominant link. 




• Scaled average neighbor weighted strength: This quantity indicates the average strength of the neighbors of 
a nodes, weighted by the events of each connection. In this version, we have rescaled the quantity by the 
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Dataset 

Case 

Sk ± (^k 

SY2 i 



CPC 


ME 


-0.23 ± 2.72 

-0.01 ±0.11 

-1.93 ±27.96 

-3.64 ± 19.59 

0.329 

Airlines 

B (M 

= 37) 

-0.18 ±2.65 

-0.01 ±0.11 

-1.60 ±27.94 

-3.13 ± 19.58 

0.329 


W (M 

= 37) 

-0.28 ± 2.80 

-0.01 ±0.11 

-2.27 ± 28.00 

-4.14 ± 19.62 

0.329 


W (M 

= 1) 

-1.64 ±5.75 

-0.01 ±0.11 

-9.26 ±28.82 

-14.42 ±20.18 

0.323 


ME 


250.91 ±201.88 

-0.00 ±0.01 

26782 ± 32260 

491 ± 422 

0.649 

Taxis 

B (M 

= 365) 

159.72 ± 192.03 

0.00 ±0.01 

84231 ± 80803 

465 ± 474 

0.639 


W (M 

= 365) 

283.80 ± 216.95 

-0.00 ±0.01 

14147 ±31352 

495 ± 403 

0.643 


W (M 

= 1) 

137.19 ± 255.48 

-0.00 ±0.01 

-14341 ±27504 

134 ± 289 

0.624 


ME 


42.24 ± 29.42 

-0.12 ±0.18 

59 • 10® ± 202 • 10® 

31.86 ± 17.55 

0.62 

WTN 

B (M 

= 100) 

- 

- 

- 

- 



W (M 

= 100) 

70.47 ±45.54 

-0.17 ±0.19 

-194- 10® ±224- 10® 

45.28 ± 23.50 

0.53 


W (M 

= 1) 

63.48 ±45.16 

-0.16 ±0.19 

-212- 10® ±227- 10® 

35.08 ±24.09 

0.53 


TABLE IV. Graph average differences between model and data quantities. Bold letters indicate the best performing model as 
indicated by the CPC index. 


uncorrelated expectation ^ 


uncorr 




s Y.j 


in 

j 


qOuI 


• Scaled average neighbor degree: These quantities represents node degree correlations, at the binary connection 
level. 


kr^ 

N 


• Sorensen Common part of Commuters index: This indicator is used to compute similarity between weighted 
matrices |3H] and we use the version (less prone to be affected by sampling) appearing in [2. 


CPCmodel = 2 


E * /X /j. \inodcl\ 

E r I /i \ model 

ij "T \tijj 


Some of these cases have the issue of low-strength nodes which do not appear in every simulation (s = 0 or fc = 0), 
leading to an undefined value of such quantities. Hence to average over an ensemble for this cases, we only compute 
the conditioned mean on existing cases. 


Appendix G: Additional comparison between models and data 

We provide in this section additional details in the comparison between models and data. Table |IV| presents the 

_ _ I—I , 

graph-average error eT = N~^ ~ i^i)) graph-average standard deviation {Xi - {xi)Y) for 

different quantities displayed in the main text. Figure]^ also shows the figures corresponding to the Airlines dataset. 

The model values have been computed averaging over r = 10^ realizations for the WTN and Airline cases and 
r = 500 for the Taxis respectively. 


Appendix H: Solving the saddle point equations 

The only caveat related with the maximum entropy ensembles of networks is the fact that one needs to solve the 
saddle point equations related with each particular ensemble one has chosen. 
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FIG. 6. Node and edge related metrics for the Airlines dataset over r = 10^ realizations, see section for details on the 
measured quantities. In this case, due to poor sampling, differences between ensembles are not clearly appreciable, yet the W 
case is clearly distinct from the others and from the real data, being also the B case the one closer to the empirical data. 
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Recovering the expressions in Appendix 2 of the main text, 


de^£ = J2ei{U,-{Uj))=AC, 


£ = - 51 = -4,c,, 


(HI) 


we see that the concavity of the function to solve is assured in all the cases, yet the method to solve each system 
of equations will depend on the specihcs of the problem at hand. In the general case, the difficulty of the problem 
depends on the number of constraints and varies with the particular restrictions of the considered case. 

For the case of hxed strengths, one can approach this problem by maximizing the likelihood associated to every 
model, compared to the actual strength list provided Hence, we need to solve the N associated {x,y} 

values considering Zij = Xit/j. So we have. 






= 6, 




(H2) 


In the following, we provide the details on the computational methods used to solve the saddle point equations, 
which for the particular case of fixed strengths discussed in the paper, is implemented in the freely available, open 
source package ODME [27]. 


1. Multi Edge case 


This scenario is fully analytical m for the case where self-loops are accepted. Otherwise, the statistic is Poisson 
so (tij) = = XiUj and one has the only restriction that Xi >0,yi> OVi, 
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(H3) 


Q /^ME _ 


The saddle point equation can be then safely solved using algorithm [^ with 


F({x,2/},{s°“*,s“}) = 


r^n+l _ 

■^9 M Ej y" 

7;"+l = 

^9 


(H4) 


Data: Constraints list {Cq}, tolerance on constraints eg, tolerance on variables Sz and initial guess { 2 *"'}. 
Result: Set of Lagrange multipliers {Zq} 

Set 2 ° = 2 j"b zl = 0 Vg G [1...0]. Set n = 0.; 
while maxIC, — {Cq) ({ 2 ”})! > eg or max| 2 "+^ — 2 "| > ez do 
Z-+^ = Fq{{z-},{Cq}y, 
n = n -\-l 

end 

Algorithm 1: Balancing algorithm to solve saddle point equations for ME and B cases. 
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2. Binary case 

In this case the restriction is analogous to the previous one and we have, 





Although the problem cannot be shown to be strictly concave in this form, the saddle point equations can again be 
solved using algorithm with the relation 


A({x,2/},{r“‘,s“}) = 


Some convergence problems can be encountered using the algorithm if the larger values of the strength approach 
the limit max{s°“‘,s™} = M, but in this case a simple, positively bounded, unconstrained gradient descent method 
has been implemented to solve the problem. 


= 


yT^ = 













(H6) 


3. Weighted case 

The weighted case is considerably more complicated than the previous ones, since it includes the restriction that 
0 < ccij/j < lVi,j and hence the maximization is performed on a non-convex domain. The balancing approach 
(algorithmic is then not satisfactory, since there is no explicit enforcement for the values {x,y} to remain in the 
domain of the Loglikelihood function one wants to maximize. The scalar function being considered is 

= X(M,{4})+M^ln(l-x,y,)+^sr‘lnx, + ^sf Iny, (H7) 

u' ^ 3 

with derivatives, 



f) nW _ _r _ _ _ 


subject to the conditions that, 

Q<Xryj<iyi,j (H9) 

In principle, the problem is concave, and thus finding a solution to the saddle point equations gives the global 
maximum. Sadly, for real cases this concavity is lost as soon as the explicit domain constraint is included 0 < Xiyj < 1. 
Hence, there is no general algorithm that can be applied with assured results, and obtaining a solution to the saddle 
point equations will not be guaranteed in all cases (specially for large iV or a very skewed distribution of strengths). 

We thus deal with a large scale, non-concave (non-convex), bounded and constrained maximization (minimization) 
problem. 
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a. Preconditioning 

Basically, the difficulty of the problem derives from the form of the strength sequence s®"}. The more skewed 

this distribution is, the more difficult the problem is to solve, for a given fixed N. For easy problems, a good way to 
pre-condition the problem is to first solve the easier, bounded, unconstrained problem of finding, 

min [-/:r(x)] 

0 < Xi < a i G 

1 (HIO) 

0 < J/j < a j G 

a G M+. 

The problem of this method is that it does not consider all the available phase space (see figure [7]) : The solution lies 
in the hyper-volume defined by the axis and ymax(xmax) = which is a larger volume than that defined by the 

axis and Xmax < Oi,ymax < a~^. Usually, a good choice is a = 1. If the distribution of of strengths is very skewed, 
the optimal solution most likely lies outside the second area, but the suboptimal solution within this region serves as 
preconditioning for the complete maximization problem thanks to the convexity of the function (without considering 
the domain). 

We have implemented this preconditioning procedure using a truncated Newton TNG method [49] from the Scipy 
suite [50] . 



FIG. 7. Sketch of a plane projection of the phase space hyper volume with a = 1.5 for the maximization problem in the W case. 
The preconditioning method looks for solutions inside the area delimited by the green rectangle. The orange area represents 
the domain of the function, which is clearly non-convex. 


h. Constrained problem 


Since the loglikelihood function is not defined outside of the domain, we use an interior point method to solve 
the problem adding L non-linear inequalities of the form 0 < Xiyj < I (L = for the case without selfloops and 
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L = N{N — 1) otherwise). The implementation is done in CVXOPT |51) . but it has an obvious limitation given 
by the use of memory, which grows very fast with the number of nodes of the given network. Additionally, as early 
mentioned, the convergence of the algorithm is not assured due to the non-convexity of the complete problem, yet in 
our cases we obtained very satisfactory cases for the different cases analyzed. 


4. Precision 


For all the cases considered, we analyze the precision of our solving approach by computing the euclidean norm of 
the absolute error and the maximum of the absolute relative error among the nodes, 

= = e_.=max^. (Hll) 

The resulting values for each example are reported in table |V] 


Dataset 

N 

C&SG Smax 

Airlines 

417 

ME 4-10“^M0-10"^® 

B(M = 37) 9-10“^^ 4-10"^"^ 

W (M = 37) 1 • 10“® 1 • 10“^^ 
W(M=1) 2-10“® 3-10"^^ 

Taxis 

4090 

ME 1-10“® 6 •10“^'^ 

B(M = 365) I-IO”’’ I-IO”^® 
W (M = 365) 7-10“® 5-10"^^ 
W(M= 1) 0.05 4-10"® 

WTN 

188 

ME 5 • 10“^° 2 • 10"^® 

B (M = 100) - 
W (M= 100) 3- 10“® 5- 10"® 
W(M=1) 7'10“'‘ 1-10"^ 


TABLE V. Results of the maximization problem. Norms associated to the best solution for each dataset and synthetic data. 
Missing values for Binary case indicate Smax > MN, hence the model is not applicable. 


Appendix I: Availability of code and Data 

The basic algorithms and codes used in this paper are all part from the freely available Origin Destination Multi 
Edge analysis package (ODME) [57j. The data for the WTN has been obtained from |Hlj, the airline network data 
from m and the Taxi data has been obtained from the New York Taxi and Limousine Commission via Freedom of 
Information Law request. 















